Efficient similarity-based operations for data integration
Similar resources
Efficient similarity-based operations for data integration
Dealing with discrepancies in data is still a big challenge in data integration systems. The problem occurs both when eliminating duplicates from semantically overlapping sources and when combining complementary data from different sources. Though SQL operations like grouping and join seem to be a viable approach, they fail if the attribute values of the potential duplicates or related...
Using Similarity-Based Operations for Resolving Data-Level Conflicts
Dealing with discrepancies in data is still a big challenge in data integration systems. The problem occurs both when eliminating duplicates from semantically overlapping sources and when combining complementary data from different sources. Though SQL operations like grouping and join seem to be a viable approach, they fail if the attribute values of the potential duplicates or related...
Data integration based on similarity matrices
Consider a classification task where the data are represented with n different views (feature sets) associated with the same output variable. The standard approach to this task is to train a classifier for each view and then to combine all the resulting n classifiers using a type of majority voting rule. Another option is to simply concatenate all views into one (huge) dataset and perform analy...
Extensible and Similarity-based Grouping for Data Integration
The general concept of grouping and aggregation appears to be a fitting paradigm for various issues in data integration, but in its common form of equality-based grouping a number of problems remain unsolved. We propose a generic approach to user-defined grouping as part of a SQL extension, allowing for more complex functions, for instance integration of data mining algorithms. Furthermore, we ...
An efficient method for obtaining similarity data
Copyright 1994 Psychonomic Society, Inc. I thank Herman Gollwitzer, Douglas Medin, Robert Nosofsky, and Richard Shiffrin for many useful comments. The research was supported by a Biomedical Research Grant from the National Institute of Health (PHS S07RR7031N) and by National Science Foundation Grant SBR-9409232. Correspondence should be addressed to R. Goldstone, Psychology Department, Indiana ...
Journal
Journal title: Data & Knowledge Engineering
Year: 2004
ISSN: 0169-023X
DOI: 10.1016/j.datak.2003.08.004